example with the BLAST software) is thus one of the most frequently used
bioinformatics methods for identifying genes or proteins in a genome.
Explain the BLAST algorithm (hint: it is sufficient to describe why the algorithm is
so fast). Also describe its usefulness for biology. If either point is still unclear,
reread the chapter.
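As a starting point for the first part of this task, the following minimal Python sketch illustrates the idea that makes BLAST fast: instead of aligning the query against every database sequence in full, short words are looked up in a precomputed index, and only diagonals that carry two nearby word hits are considered further. The function names, the word length k=3, and the window of 40 are illustrative assumptions, not the actual BLAST implementation. The sketch also uses exact word matches only, whereas protein BLAST additionally accepts similar "neighborhood" words.

```python
from collections import defaultdict


def build_word_index(database, k=3):
    """Map every k-letter word to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def find_seeds(query, index, k=3):
    """Return (seq_id, query_pos, db_pos) for every exact word match (a "seed")."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        for seq_id, dpos in index.get(query[qpos:qpos + k], []):
            seeds.append((seq_id, qpos, dpos))
    return seeds


def two_hit_diagonals(seeds, k=3, window=40):
    """Keep (seq_id, diagonal) pairs that carry two non-overlapping seeds
    no more than `window` positions apart (simplified two-hit criterion)."""
    by_diagonal = defaultdict(list)
    for seq_id, qpos, dpos in seeds:
        by_diagonal[(seq_id, dpos - qpos)].append(qpos)
    hits = set()
    for key, positions in by_diagonal.items():
        if any(k <= b - a <= window for a in positions for b in positions):
            hits.add(key)
    return hits


# Hypothetical toy database; only diagonals returned here would be extended.
database = {"seqA": "MKVLATGHW", "seqB": "GGGGGGGGG"}
index = build_word_index(database)
candidates = two_hit_diagonals(find_seeds("MKVLATGHW", index))
```

The index is built once per database, so each query costs roughly one lookup per query word rather than a full comparison with every database residue.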
Task 1.5
Develop a simple program that examines a sequence for possible sequence similarities in
a database (hint: enumerate what parts this program would consist of).
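One possible decomposition into parts is sketched below in Python: (1) read the database, (2) score the query against one database sequence, and (3) rank the results. This is a deliberately simple, exhaustive sketch under stated assumptions: the function names, the +1/-1 scoring, and the threshold min_score=5 are illustrative choices, gaps are not considered, and no heuristic speed-up is used.

```python
def parse_fasta(text):
    """Part 1: read FASTA-formatted text into {header: sequence}."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def best_ungapped_score(a, b, match=1, mismatch=-1):
    """Part 2: score of the best ungapped local segment over all diagonals."""
    best = 0
    for shift in range(-(len(a) - 1), len(b)):
        running = 0
        for i in range(len(a)):
            j = i + shift
            if 0 <= j < len(b):
                step = match if a[i] == b[j] else mismatch
                running = max(0, running + step)  # restart when score drops below 0
                best = max(best, running)
    return best


def search(query, database, min_score=5):
    """Part 3: score every database entry, keep hits, and rank them best first."""
    hits = [(best_ungapped_score(query, seq), name) for name, seq in database.items()]
    return sorted((hit for hit in hits if hit[0] >= min_score), reverse=True)


# Hypothetical two-entry database for illustration.
database = parse_fasta(">p1\nAAPQITLWQRPLAA\n>p2\nGGGGGGGG\n")
results = search("PQITLWQRPL", database)
```

Because every query position is compared with every database position, the running time grows with the product of query length and database size, which is exactly the cost that a heuristic such as BLAST avoids.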
Task 1.6
Which of the following statements about BLAST are correct (multiple answers possible)?
A. BLAST = Basic Local Alignment Search Tool.
B. BLAST = Basic Low Alignment Search Tool.
C. BLAST is an algorithm for finding locally similar sequence segments in a database.
D. BLAST uses a heuristic search, specifically the two-hit method.
Task 1.7
Example: Sequencing a sample from a patient has revealed the following protein sequence:
>unknownsequence 1.7
PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQIL
IEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF
Which BLAST program would you choose for your patient sequence?
A. blastn.
B. blastp.
C. blastx or tblastx.
D. tblastn.
Task 1.8
You now want to know exactly which virus the person has contracted. Perform a BLAST
search yourself using the protein sequence (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
Which of the following statements are correct (multiple answers possible)?
A. The sequence is almost certainly the pol protein and protease of the HIV-1 virus.
B. The unknown sequence shows low similarity to the pol protein and protease of the
HIV-1 virus.
C. When searching for a sequence that is as similar/identical as possible, a match
should always have as large an E-value as possible and a low identity.
D. The E-value (expected value) shows how likely it is that the hit will be found again
in the database with a similar or better score.
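To judge the statements about the E-value, it can help to see how such a value is commonly computed. In Karlin-Altschul statistics, the expected number of chance hits with a raw score of at least S is E = K * m * n * exp(-lambda * S), where m is the query length and n the database length. The Python sketch below evaluates this formula; the function name is hypothetical, and the values used for lambda and K are placeholders chosen only for illustration, since the real values depend on the scoring matrix and gap costs.

```python
import math


def expected_hits(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least `raw_score`
    (Karlin-Altschul formula E = K * m * n * exp(-lambda * S))."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Placeholder parameters (assumptions, not values taken from a BLAST report).
LAMBDA, K = 0.3, 0.1
weak = expected_hits(raw_score=30, query_len=99, db_len=1_000_000, lam=LAMBDA, k=K)
strong = expected_hits(raw_score=200, query_len=99, db_len=1_000_000, lam=LAMBDA, k=K)
```

With the same query and database, the higher score yields the smaller E-value, and a larger database raises the E-value for the same score.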
1 Sequence Analysis: Deciphering the Language of Life